Technion Researchers Develop an Innovative Approach for Identifying Limitations and “Hallucinations” in Artificial Intelligence Models

Large language models are transforming a wide range of tasks, including translation, text comprehension, and code generation. However, these models also have shortcomings that still need to be addressed, including biases, disregard for instructions, and “hallucinations” (i.e., the generation of inaccurate information).

These challenges are a major focus of the research group led by Dr. Haggai Maron from the Andrew and Erna Viterbi Faculty of Electrical and Computer Engineering at the Technion, in collaboration with researchers from other universities and NVIDIA. Recently, three papers by the group were accepted to the most prestigious conferences in machine learning: ICLR 2026, NeurIPS 2025, and AAAI 2026. The papers were led by Ph.D. student Guy Bar-Shalom (co-advised by Prof. Ran El-Yaniv) and postdoctoral researcher Dr. Fabrizio Frasca, in collaboration with Dr. Yftah Ziser (University of Groningen and NVIDIA).

In the photo, from left to right: Guy Bar-Shalom, Dr. Haggai Maron, and Dr. Fabrizio Frasca

Dr. Maron and his team propose a new research direction for identifying failures and flaws in text generated by large language models. Instead of attempting to fully understand how the model operates at every level (something that remains beyond the current reach of the research community), the authors suggest a more pragmatic, cheaper, and faster approach. Their method builds and deploys new machine-learning systems on top of the models’ internal computations, designed to leverage the complex internal structure of those computations. The goal is for these learning systems to detect and use hidden information embedded in the computations, even when humans do not fully understand it. The key achievement is showing that risks can be monitored and diagnosed externally and at low cost. This approach enables users to supervise the model, predict its behavior, and control it without understanding the entire mechanism.
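To make the general idea concrete, here is a minimal sketch of a small learning system trained on top of a model's internal computations. It is not the group's method: it swaps in the simplest possible stand-in, a logistic-regression probe, and it runs on synthetic vectors rather than activations from a real language model. The names `make_synthetic_activations`, `train_probe`, and `failure_score` are hypothetical, as is the assumption that a single hidden direction separates correct from incorrect responses.

```python
import math
import random


def make_synthetic_activations(n, dim, seed):
    """Hypothetical stand-in for hidden states recorded from a language model.

    Each item is (vector, label); label 1 means "the response was wrong".
    ASSUMPTION: failures shift the vector along one fixed hidden direction.
    """
    rng = random.Random(seed)
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 0.8 if label == 1 else -0.8
        vec = [rng.gauss(0.0, 1.0) + shift * d for d in direction]
        data.append((vec, label))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(data, epochs=30, lr=0.05):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            err = p - label  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, vec)]
            b -= lr * err
    return w, b


def failure_score(probe, vec):
    """Estimated probability that the response behind `vec` is a failure."""
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)


def accuracy(probe, data):
    """Fraction of examples where thresholding the score at 0.5 is correct."""
    hits = sum((failure_score(probe, vec) >= 0.5) == (label == 1)
               for vec, label in data)
    return hits / len(data)


# Train on one synthetic sample, evaluate on a separately seeded one.
train_set = make_synthetic_activations(400, dim=8, seed=0)
test_set = make_synthetic_activations(200, dim=8, seed=1)
probe = train_probe(train_set)
acc = accuracy(probe, test_set)
print(f"held-out accuracy of the probe: {acc:.2f}")
```

The probe never inspects how the underlying model works; it only learns which patterns in the recorded vectors go together with failures. In a real setting, the synthetic data would be replaced by internal signals recorded from the monitored model, and the article indicates that the group's systems exploit far more of the structure of those computations than a linear probe can.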

The research addresses one of the most critical challenges of the AI era: how to identify when a large language model is making mistakes, fabricating information, or deviating from expected behavior. The methods developed at the Technion provide rapid and effective diagnostics that do not depend on understanding the entire mechanism or the model’s training process.

The new approach opens broad practical possibilities, including the development of warning systems, quality assurance tools, and safety standards for language models used in medicine, research, education, regulation, and other fields. This marks an important step toward the responsible integration of artificial intelligence into critical systems and toward making AI tools more reliable.

This series of studies is part of a broader research program in Dr. Maron’s laboratory, where the group investigates how patterns can be learned from new types of data extracted from trained models, such as their weights and the signals used during training.
